Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
Li, Meng, Vrazitulis, Michael, Schlangen, David
Rational speakers are supposed to know what they know and what they do not know, and to generate expressions matching the strength of evidence. In contrast, it is still a challenge for current large language models to generate corresponding utterances based on the assessment of facts and confidence in an uncertain real-world environment. While it has recently become popular to estimate and calibrate confidence of LLMs with verbalized uncertainty, what is lacking is a careful examination of the linguistic knowledge of uncertainty encoded in the latent space of LLMs. In this paper, we draw on typological frameworks of epistemic expressions to evaluate LLMs' knowledge of epistemic modality, using controlled stories. Our experiments show that the performance of LLMs in generating epistemic expressions is limited and not robust, and hence the expressions of uncertainty generated by LLMs are not always reliable. To build uncertainty-aware LLMs, it is necessary to enrich semantic knowledge of epistemic modality in LLMs.
From Goal-Conditioned to Language-Conditioned Agents via Vision-Language Models
Cachet, Theo, Dance, Christopher R., Sigaud, Olivier
Vision-language models (VLMs) have tremendous potential for grounding language, and thus enabling language-conditioned agents (LCAs) to perform diverse tasks specified with text. This has motivated the study of LCAs based on reinforcement learning (RL) with rewards given by rendering images of an environment and evaluating those images with VLMs. If single-task RL is employed, such approaches are limited by the cost and time required to train a policy for each new task. Multi-task RL (MTRL) is a natural alternative, but requires a carefully designed corpus of training tasks and does not always generalize reliably to new tasks. Therefore, this paper introduces a novel decomposition of the problem of building an LCA: first find an environment configuration that has a high VLM score for text describing a task; then use a (pretrained) goal-conditioned policy to reach that configuration. We also explore several enhancements to the speed and quality of VLM-based LCAs, notably, the use of distilled models, and the evaluation of configurations from multiple viewpoints to resolve the ambiguities inherent in a single 2D view. We demonstrate our approach on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training. Videos and an interactive demo can be found at https://europe.naverlabs.com/text2control
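The first stage of the decomposition described above (choose a configuration that scores well for the task text, then hand it to a goal-conditioned policy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `select_goal`, `multi_view_score`, and `toy_vlm_score` are hypothetical, the search is a plain maximum over a fixed candidate list rather than whatever search the paper actually uses, and the scoring function is a toy stand-in for a real VLM evaluating rendered images.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Config = Tuple[float, ...]  # hypothetical: an environment configuration as a tuple of floats
Scorer = Callable[[str, Config, str], float]  # (text, config, viewpoint) -> score


def multi_view_score(text: str, config: Config,
                     viewpoints: Sequence[str], vlm_score: Scorer) -> float:
    """Average the score over several viewpoints to reduce single-view ambiguity."""
    return mean(vlm_score(text, config, view) for view in viewpoints)


def select_goal(text: str, candidates: Sequence[Config],
                viewpoints: Sequence[str], vlm_score: Scorer) -> Config:
    """Return the candidate configuration with the highest multi-view score."""
    return max(candidates,
               key=lambda c: multi_view_score(text, c, viewpoints, vlm_score))


def toy_vlm_score(text: str, config: Config, viewpoint: str) -> float:
    """Toy stand-in for a VLM (assumption): rewards a large first coordinate
    when the text mentions 'raise', and ignores the viewpoint."""
    height = config[0]
    return height if "raise" in text else -height


goal = select_goal("raise the arm", [(0.1,), (0.9,), (0.5,)],
                   ["front", "side"], toy_vlm_score)
# `goal` would then be passed to a pretrained goal-conditioned policy (not shown).
```

The point of the split is that the policy never sees text: only the goal-selection step depends on the VLM, so the policy can be trained without task descriptions.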
Text2Motion: From Natural Language Instructions to Feasible Plans
Lin, Kevin, Agia, Christopher, Migimatsu, Toki, Pavone, Marco, Bohg, Jeannette
We propose Text2Motion, a language-based planning framework enabling robots to solve sequential manipulation tasks that require long-horizon reasoning. Given a natural language instruction, our framework constructs both a task- and motion-level plan that is verified to reach inferred symbolic goals. Text2Motion uses feasibility heuristics encoded in Q-functions of a library of skills to guide task planning with Large Language Models. Whereas previous language-based planners only consider the feasibility of individual skills, Text2Motion actively resolves geometric dependencies spanning skill sequences by performing geometric feasibility planning during its search. We evaluate our method on a suite of problems that require long-horizon reasoning, interpretation of abstract goals, and handling of partial affordance perception. Our experiments show that Text2Motion can solve these challenging problems with a success rate of 82%, while prior state-of-the-art language-based planning methods only achieve 13%. Text2Motion thus provides promising generalization characteristics to semantically diverse sequential manipulation tasks with geometric dependencies between skills.
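One way to picture the difference between checking skills individually and checking a whole sequence is the sketch below. It is an illustration under stated assumptions, not the Text2Motion implementation: it treats each skill's Q-value as a success estimate in [0, 1] and scores a sequence by the product of those values. The function names and the numbers are hypothetical.

```python
from math import prod
from typing import Dict, Sequence, Tuple

Plan = Tuple[str, ...]  # hypothetical: a plan is a tuple of skill names


def sequence_feasibility(q_values: Sequence[float]) -> float:
    """Score a skill sequence as the product of per-skill success estimates
    (assumption: each Q-value is read as a probability of success)."""
    return prod(q_values)


def pick_plan(candidates: Dict[Plan, Sequence[float]]) -> Plan:
    """Return the candidate plan with the highest sequence-level score."""
    return max(candidates, key=lambda plan: sequence_feasibility(candidates[plan]))


# Illustrative numbers only: the shorter plan has one weak step, so the longer
# plan wins once the whole sequence is scored.
candidates = {
    ("pick", "place"): [0.9, 0.8],
    ("push", "pick", "place"): [0.95, 0.9, 0.9],
}
best = pick_plan(candidates)
```

Scoring the sequence as a whole lets an early skill be penalized for leaving the scene in a state that makes a later skill unlikely to succeed.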
Learning Action Duration and Synergy in Task Planning for Human-Robot Collaboration
Sandrini, Samuele, Faroni, Marco, Pedrocchi, Nicola
A good estimation of the actions' cost is key in task planning for human-robot collaboration. The duration of an action depends on agents' capabilities and the correlation between actions performed simultaneously by the human and the robot. This paper proposes an approach to learning actions' costs and coupling between actions executed concurrently by humans and robots. We leverage the information from past executions to learn the average duration of each action and a synergy coefficient representing the effect of an action performed by the human on the duration of the action performed by the robot (and vice versa). We implement the proposed method in a simulated scenario where both agents can access the same area simultaneously. Safety measures require the robot to slow down when the human is close, denoting a bad synergy of tasks operating in the same area. We show that our approach can learn such bad couplings so that a task planner can leverage this information to find better plans.
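As a rough illustration of the two quantities mentioned above, the sketch below estimates an average duration per robot action and a synergy coefficient per (robot action, human action) pair from logged executions. The estimator is an assumption made for this example (a ratio of the pair's mean duration to the action's overall mean), and the function name, record layout, and action names are hypothetical; the paper's actual model may differ.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Execution = Tuple[str, str, float]  # (robot_action, human_action, robot_duration_s)


def learn_costs(executions: List[Execution]):
    """Estimate per-action mean durations and pairwise synergy coefficients.

    A synergy above 1 means the robot action tends to take longer than its
    average when run alongside that human action (a bad coupling); below 1
    means it tends to be faster.
    """
    by_action: Dict[str, List[float]] = defaultdict(list)
    by_pair: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for robot_action, human_action, duration in executions:
        by_action[robot_action].append(duration)
        by_pair[(robot_action, human_action)].append(duration)

    avg_duration = {action: mean(ds) for action, ds in by_action.items()}
    synergy = {pair: mean(ds) / avg_duration[pair[0]] for pair, ds in by_pair.items()}
    return avg_duration, synergy


# Illustrative log: the robot slows down when the human works in the same area.
log = [
    ("pick", "work_far_away", 2.0),
    ("pick", "work_same_area", 4.0),
]
avg_duration, synergy = learn_costs(log)
```

A task planner could then multiply an action's average duration by the relevant synergy coefficient to get an expected cost for each concurrent assignment.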
Getting Connected with Google Home Using API.AI & Talend
"OK Google, what can you do when connected to Talend?" In this tutorial, I will show how to create an Agent in API.AI that will respond to commands spoken to Google Home. The Agent will reverse the words in a sentence spoken to Google Home by making use of a Talend web service which is used to carry out the word reversal. A very simple example, but it demonstrates the ground work you will need to create some really quite interesting applications. You do not need one to try this tutorial out as Google has provided an emulator, but I can highly recommend the device. Recently Google opened up access to the Actions on Google API. You can either use the Actions SDK or use API.AI. API.AI was recently acquired by Google. While API.AI is really quite simple to use, it is quite limited in how it can be used with Google Home at the moment.
Introducing CS to Newcomers, and JES As a Teaching Tool
I had an interesting experience recently. I agreed to run a session on computer science for the STEP (Science and Technology Entry Program) students at Union College's Kenney Community Center. The range of students was large, from 7th to 12th grade. Usually in a session like this I start by asking two things. First, in what ways are computers already in their lives?
Practice Programming Through Play
Solving puzzles through your own powers of thought gives a certain kind of satisfaction that is especially rewarding. Games like Sudoku, Tetris, and Rubik's Cube are great for strengthening mathematical thinking and visual-spatial intelligence. Nowadays we seem to have an endless supply of puzzle games on mobile devices to keep our minds occupied during all of the spare moments of the day. It's fine to use puzzle games to fill up the empty spaces of time, but I've found some games that entice me to go much deeper. Lately I've been getting into games geared towards introducing kids to programming concepts.